Skip to content

[codex] Apply generated cpflow GitHub Actions flow#732

Draft
justin808 wants to merge 29 commits intomasterfrom
codex/cpflow-github-actions-flow
Draft

[codex] Apply generated cpflow GitHub Actions flow#732
justin808 wants to merge 29 commits intomasterfrom
codex/cpflow-github-actions-flow

Conversation

@justin808
Copy link
Copy Markdown
Member

@justin808 justin808 commented Apr 30, 2026

Summary

  • Replace the older hand-written Control Plane GitHub Actions with the generated cpflow-* action and workflow set from cpflow generate-github-actions.
  • Keep review apps opt-in, staging deploys branch-gated, production promotion manual, and nightly review-app cleanup in the generated flow.
  • Update Control Plane docs links so they point at the new cpflow-* action/workflow names.

Verification

  • BUNDLE_GEMFILE=/tmp/cpflow-pr-278/Gemfile bin/conductor-exec bundle exec ruby /tmp/cpflow-pr-278/bin/cpflow github-flow-readiness — no blocking readiness issues detected.
  • bin/conductor-exec ruby -e 'require "yaml"; paths = Dir[".github/**/*.yml"] + Dir[".github/**/*.yaml"] + Dir[".controlplane/**/*.yml"]; paths.sort.each { |path| YAML.load_file(path, aliases: true); puts path }'
  • bin/conductor-exec git diff --cached --check
  • bin/conductor-exec bundle exec rubocop

Not Run / Blocked

  • bin/conductor-exec bundle exec rspec could not complete locally because PostgreSQL was not running at /tmp/.s.PGSQL.5432; after the first Rails load failure, Coveralls setup also reported repeated coverage initialization errors.

Note

Medium Risk
Touches CI/CD deployment and production promotion workflows; mistakes could block deployments or mis-handle secrets, though the changes add additional validation and safety checks.

Overview
Replaces the hand-written Control Plane GitHub Actions with the generated cpflow-* composite actions and workflows for review apps, staging deploys, production promotion, and nightly stale review-app cleanup.

The new flow makes review apps explicitly opt-in via exact /deploy-review-app comments (with automatic redeploys after creation) and adds stricter safeguards around secrets/trust boundaries (config validation, fork restrictions, safer token/SSH handling, and prefix-guarded deletes). Docs and internal team notes are updated to reference the new workflows and required repo secrets/variables.

Reviewed by Cursor Bugbot for commit 2c49140. Bugbot is set up for automated code reviews on this repo. Configure here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f52af42b-1cf6-46d0-91f8-a75fff73dea1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/cpflow-github-actions-flow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Upstream token not passed to copy-image command
    • Added the required -t upstream token argument to the cpflow copy-image-from-upstream command while keeping the token sourced from secrets via the environment.

Create PR

Or push these changes by commenting:

@cursor push 3606e16506
Preview (3606e16506)
diff --git a/.github/workflows/cpflow-promote-staging-to-production.yml b/.github/workflows/cpflow-promote-staging-to-production.yml
--- a/.github/workflows/cpflow-promote-staging-to-production.yml
+++ b/.github/workflows/cpflow-promote-staging-to-production.yml
@@ -203,14 +203,13 @@
 
       - name: Copy image from staging
         env:
-          # Pass the upstream token via env rather than `-t` so it doesn't appear in /proc/<pid>/cmdline.
           CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
           PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
           CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
         shell: bash
         run: |
           set -euo pipefail
-          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}"
+          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" -t "${CPLN_UPSTREAM_TOKEN}" --org "${CPLN_ORG_PRODUCTION}"
 
       - name: Deploy image to production
         env:

You can send follow-ups to the cloud agent here.

Comment thread .github/workflows/cpflow-promote-staging-to-production.yml Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow GitHub Actions Migration

This is a solid migration from hand-written to generated cpflow-* workflows. The overall architecture is a clear improvement. Here's a summary of what stands out.


Security — meaningful fixes

  • Token exposure fixed: The old setup-environment action passed the token as --token "$TOKEN" (visible in /proc/<pid>/cmdline and ps). The new action correctly uses CPLN_TOKEN env var. Same for CPLN_UPSTREAM_TOKEN in the promote workflow.
  • Critical: /deploy-review-app now requires author_association. The old workflow had no author_association check, meaning any GitHub user who could comment on a PR could trigger a deployment. The new check ("OWNER","MEMBER","COLLABORATOR") correctly gates this.
  • Exact command match (== '/deploy-review-app') replaces contains(). Prevents accidental triggers from comments like "I ran /deploy-review-app earlier and it worked".
  • SSH key isolation: The cpflow-build-docker-image action scopes DOCKER_BUILD_SSH_KEY to a dedicated step so it never appears in the build step's environment — even when ACTIONS_STEP_DEBUG=true is set. The EXIT trap cleans up the key file and kills the agent. Well designed.
  • Fork PR handling: Fork PRs correctly skip auto-deploy (which can't access secrets) while still allowing approved collaborators to deploy via /deploy-review-app. The cpflow-review-app-help.yml is explicit about why pull_request_target is used there (no code checkout = safe for write access from forks).

Bugs fixed

  • cancel-in-progress: false on all deploy workflows. The old staging workflow used cancel-in-progress: true, which could interrupt a cpflow deploy-image mid-rollout and leave the workload partially updated. Now fixed across all three deployment workflows.
  • Delete on PR close now uses pull_request_target. The old delete-review-app.yml used pull_request: types: [closed], which runs in a restricted context without secret access when a fork PR closes. The new workflow correctly uses pull_request_target for the close event.
  • --run-release-phase is now conditional. The old staging deploy hardcoded it unconditionally; the new workflows detect release_script: presence at runtime before deciding.
  • Delete safety uses prefix allowlist, not blocklist. The old delete-app.sh used grep -iqE '(production|staging)' to block "dangerous" names — fragile against future app naming. The new delete-app.sh requires the app name to start with REVIEW_APP_PREFIX, which is a stronger guarantee.

Potential issues

  1. cpflow exists in the deploy workflow doesn't distinguish auth errors from "not found" (see inline comment). The delete-app.sh carefully pattern-matches error output to surface API/auth failures, but the deploy workflow's check-app step treats all non-zero exits as exists=false. An intermittent auth error on a push event would silently skip the deploy.

  2. staging deploy job has no needs: validate-branch: The deploy job correctly declares needs: [validate-branch, build] and has an if: guard, but if validate-branch produces no output (e.g., first run where the variable step fails), the if: could evaluate against an empty string. In practice the validation step exits non-zero, so build never runs and deploy is skipped by the needs: dependency. This is fine but the if: guard is slightly redundant with the needs: dependency on validate-branch.

  3. Help command has no contents: read permission: cpflow-help-command.yml declares only issues: write and pull-requests: write. There's no explicit contents: read but GitHub's default GITHUB_TOKEN grants read on public repos, so this works — just worth noting that it relies on the repo's default permissions.


Nits

  • cpflow-deploy-staging.yml uses GITHUB_REF_NAME (simpler), whereas the old workflow used ${GITHUB_REF#refs/heads/} (strip-prefix approach that would silently behave oddly for tag refs). Small but cleaner.
  • The shakacode-team.md link fix (-info org URL corrected to shakacode-open-source-examples-production) is a separate correctness fix bundled in cleanly.

Overall this is well-structured with meaningful security and reliability improvements. The one item I'd prioritize is the cpflow exists error handling consistency noted in the inline comment.

if cpflow exists -a "${APP_NAME}" --org "${CPLN_ORG}"; then
echo "exists=true" >> "$GITHUB_OUTPUT"
else
echo "exists=false" >> "$GITHUB_OUTPUT"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if/else around cpflow exists conflates two different failure modes: "app does not exist" (expected non-zero) and "API/auth error" (unexpected non-zero). Both land in the else branch and set exists=false.

Compare delete-app.sh in this same PR, which carefully pattern-matches the output ("Double check your org", "Unknown API token format", "ERROR", etc.) to distinguish a real failure from a genuine not-found and surface it as a hard error.

If there's a transient auth issue when a PR is pushed, this step sets exists=false and the pull_request path silently writes a step-summary saying the app doesn't exist yet — masking the real problem until the deploy is manually retried.

Suggested approach: mirror delete-app.sh's pattern — capture the output, check exit code, inspect the text for known error tokens, and exit non-zero on a real failure rather than treating it as "not found".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b9095c6 by capturing cpflow exists output and failing hard on auth/API error tokens instead of treating every non-zero exit as app-not-found.


if [[ -f "${HOME}/.ssh/cpflow_build_key" ]]; then
eval "$(ssh-agent -s)"
trap 'ssh-agent -k >/dev/null; rm -f "${HOME}/.ssh/cpflow_build_key"' EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EXIT trap kills the SSH agent and removes cpflow_build_key, but the ~/.ssh/known_hosts file written in the previous step is not cleaned up here. On ephemeral GitHub-hosted runners this is harmless (the VM is discarded), but if this action is ever adopted on self-hosted runners with a persistent ~/.ssh, the custom known_hosts entries would accumulate across runs.

Consider adding rm -f "${HOME}/.ssh/known_hosts" to the trap, or using a temp file under $RUNNER_TEMP instead of ~/.ssh for the known_hosts.

default: "3.3.1"
cpflow_version:
description: cpflow gem version
required: false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI version is pinned here as a default, which is good for reproducibility. Worth noting: since sudo npm install -g fetches from the npm registry at runtime, a compromised release of @controlplane/cli@3.3.1 would silently affect all runs. Consider adding a SHA/integrity check (e.g. via npm install --integrity) or at minimum locking to a digest-pinned download, especially since this step runs in workflows that access staging/production tokens.


current_image="${selected_image:-${first_image}}"
current_version="${selected_version:-${first_version}}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The staging env-var parity check only compares names, not values. It will catch a missing DATABASE_URL in production, but it won't catch a case where production has the right key names but wrong or stale values (e.g. a staging secret was rotated but the production counterpart wasn't updated). This is expected behaviour for a name-only check — just worth documenting explicitly so it's clear what the check does and doesn't guarantee.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in b9095c6: the production promotion workflow checks that production has the same environment variable names as staging, but it does not compare secret values.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: Replace hand-written CI/CD Actions with generated cpflow- flow*

Well-considered overhaul that hardens security, safety, and operational reliability. No blocking issues found.

Security improvements

Token no longer passed via --token flag — old setup-environment exposed it in /proc/cmdline and ps; now uses CPLN_TOKEN env var only. SSH key scoped to a dedicated step so ACTIONS_STEP_DEBUG cannot leak it from the build step. Delete command now gated on author_association (old workflow had no permission check). Fork PR protection blocks syncs before any secret is consumed. Old contains() command matching replaced with exact match.

Safety improvements

cancel-in-progress changed from true to false everywhere — old staging deploy risked partial GVC deployments mid-rollout. Prefix-based delete guard replaces the old substring check for production/staging names. Production rollback logic added with per-workload state capture and restore — old workflow had no rollback at all. Health checks now configurable (HEALTH_CHECK_RETRIES, HEALTH_CHECK_INTERVAL, HEALTH_CHECK_ACCEPTED_STATUSES).

Issue 1: Fork-PR /deploy-review-app comment silently fails with no PR feedback

When a collaborator comments /deploy-review-app on a fork PR, the job condition passes but Validate review app deployment source exits 1 with only a runner log message. No PR comment explains the rejection. Suggest posting an issues.createComment before exit 1, similar to how the unconfigured-secrets path writes to GITHUB_STEP_SUMMARY.

Issue 2: cpflow exists error detection is string-matching fragile

delete-app.sh uses case matching over stderr to distinguish app-not-found from auth/network errors. The inline TODO acknowledges this. If cpflow changes an error message, the pattern will not match and a real failure will be silently treated as app not found, causing delete to return success vacuously. The TODO to switch to a structured exit code is the right long-term fix.

Issue 3: gem install / npm install without integrity pinning

cpflow-setup-environment/action.yml installs @controlplane/cli@3.3.1 and cpflow 4.2.0 without hash verification. A re-tagged or compromised release would install silently. For production-adjacent CI, consider verifying SHA256 after install or using lockfiles with integrity entries.

Issue 4: Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs on cron with no failure notification. Silent failures accumulate stale apps over days. Consider a step gated on if: failure() that posts a GitHub issue or annotation.

Minor observations

APP_NAME / PR_NUMBER double-assignment is intentional (workflow-level env needed early for concurrency.group, step overwrites with resolved values) but a short comment would clarify intent. cpflow-review-app-help.yml using pull_request_target is correct and well-documented. deploy job in cpflow-deploy-staging.yml re-runs cpflow-setup-environment unnecessarily (only needs cpflow deploy-image) — minor redundancy, acceptable for simplicity.

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case pattern matching over stderr is the right approach given cpflow exists doesn't expose a structured signal, and the inline TODO is the right way to track this. One additional defensive measure worth considering: log the full exists_output to the runner even when the "not found" path is taken, so there's an audit trail if unexpected output silently passed through the filter. Right now the output is printed only if non-empty, but it's easy to miss in logs.

Also worth noting: the pattern list doesn't include "timeout", "connection refused", or "ENOTFOUND" — all plausible network failure messages. If cpflow's Ruby HTTP client times out, it may surface Errno::ECONNREFUSED or Net::ReadTimeout, which would match "Net::", so that's covered. But curl-style timeouts would not. Low risk given this is a Ruby gem, but worth keeping in mind when bumping the cpflow version.

shell: bash
run: |
set -euo pipefail
cpflow cleanup-stale-apps -a "${{ vars.REVIEW_APP_PREFIX }}" --org "${{ vars.CPLN_ORG_STAGING }}" --yes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No failure notification exists for this cron job. If cpflow cleanup-stale-apps fails (auth error, API hiccup, etc.), stale review apps accumulate silently across days.

Consider appending a step:

      - name: Notify on failure
        if: failure()
        run: |
          echo "::error::Stale review app cleanup failed. Check the workflow logs."

Or, if the team monitors GitHub issues, use gh issue create to open a tracking issue on failure so it's not lost in workflow history.

} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a collaborator comments /deploy-review-app on a fork PR, execution reaches this exit 1 with only a runner log message — no feedback is posted to the PR thread. The collaborator sees a red workflow run with no explanation in the PR.

Consider posting a comment before exiting. The GH_TOKEN is available from the Resolve PR ref and commit step but not inherited here; you'd need to add GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} to this step's env block. Something like:

gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/comments" \
  --method POST \
  --field body="Fork PRs cannot use /deploy-review-app because this workflow uses repository secrets. Ask a maintainer to deploy from a branch in the base repository." \
  --silent || true

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe: fork PR /deploy-review-app rejections now post a PR comment before exiting, while still keeping the deployment blocked because the workflow uses repository secrets.

run: |
set -euo pipefail

sudo npm install -g @controlplane/cli@${{ inputs.cpln_cli_version }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version-pinned but not hash-verified. A malicious actor who gains write access to the @controlplane/cli npm package (or re-tags a release) could substitute a compromised binary. Same applies to gem install cpflow below.

For production-adjacent pipelines, consider adding a post-install checksum check, or at minimum document the expected SHA256 for each pinned version in a comment so drift is detectable during future version bumps.

run: |
# Best-effort rollback: try every workload, aggregate failures, exit non-zero at the end
# if any failed. A single cpln hiccup shouldn't leave other workloads mid-promotion.
set -uo pipefail
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intentional omission of -e here (using set -uo pipefail instead of set -euo pipefail) is the correct design — it lets rollback continue across all workloads even if one cpln call fails, with rollback_failures tracking the aggregate. Worth a brief comment explaining the omission so a future reader doesn't add -e thinking it was forgotten:

Suggested change
set -uo pipefail
# Intentionally no -e: we want every workload to attempt rollback even if earlier ones fail.
set -uo pipefail

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe with a comment explaining that set -e is intentionally omitted so rollback can attempt every workload and aggregate failures.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Token not persisted for subsequent workflow steps
    • The composite action now writes CPLN_TOKEN to GITHUB_ENV so subsequent cpflow and cpln workflow steps inherit authentication without passing the token on the command line.

Create PR

Or push these changes by commenting:

@cursor push f1eee564c7
Preview (f1eee564c7)
diff --git a/.github/actions/cpflow-setup-environment/action.yml b/.github/actions/cpflow-setup-environment/action.yml
--- a/.github/actions/cpflow-setup-environment/action.yml
+++ b/.github/actions/cpflow-setup-environment/action.yml
@@ -63,6 +63,12 @@
           exit 1
         fi
 
+        {
+          echo "CPLN_TOKEN<<__CPLN_TOKEN__"
+          echo "$CPLN_TOKEN"
+          echo "__CPLN_TOKEN__"
+        } >> "$GITHUB_ENV"
+
         create_output=""
         if ! create_output="$(cpln profile create default --org "$ORG" 2>&1)"; then
           if ! echo "$create_output" | grep -qi "already exists"; then

You can send follow-ups to the cloud agent here.

Comment thread .github/actions/cpflow-setup-environment/action.yml
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review — PR #732: Generated cpflow GitHub Actions Flow

Overview

This is a well-structured replacement of hand-written CI/CD workflows with the generated cpflow-* composite actions and workflows. The new flows are meaningfully safer and more correct than the old ones across several dimensions. The PR is in draft but the code quality is high enough to call out specifics.


Significant Improvements ✅

  • Secret handling: Old code passed CPLN_TOKEN via --token (visible in /proc/<pid>/cmdline and ps output). New code passes it exclusively via environment — good security hygiene across all three cpln/cpflow call sites.
  • Prefix guard on deletion: delete-app.sh now validates that the app name starts with $REVIEW_APP_PREFIX- before deleting, which is a much stronger guard than the old "reject names containing 'production' or 'staging'" heuristic. The old approach would have blocked deletion of any app with those words in the name (too broad) while leaving other non-review apps unprotected (too narrow).
  • Fork PR protection: The deploy workflow explicitly detects fork PRs and blocks them from accessing repository secrets — the old workflow had no such guard.
  • Author association gating: Slash commands now check author_association is OWNER, MEMBER, or COLLABORATOR. The old delete workflow had no collaborator check at all.
  • Non-cancelling concurrency (cancel-in-progress: false): Prevents partial deployments and half-deleted apps — this was cancel-in-progress: true in the old staging workflow.
  • Multi-workload rollback: The promote workflow rolls back all workloads independently on failure, aggregating errors rather than stopping at the first failure.
  • Env-var parity check: Pre-promotion check that production has all staging env var names prevents silent nil variable errors after promotion.
  • Health checks with configurable accepted statuses: The default 200 301 302 is well-reasoned for auth-gated Rails apps; HEALTH_CHECK_ACCEPTED_STATUSES makes it overridable without editing the file.
  • Randomized heredoc delimiter in capture-current: EOF_$(openssl rand -hex 8) correctly prevents heredoc injection if rollback state ever contained a literal EOF line.

Issues

Bug: deploy job may run even when build fails in cpflow-deploy-staging.yml

In GitHub Actions, an explicit if: on a job replaces the implicit success() guard. The deploy job condition is:

if: needs.validate-branch.outputs.is_deployable == 'true'

Because this doesn't check needs.build.result, the deploy job will run even if the build job fails (as long as the branch is deployable). This burns runner minutes and produces a confusing deploy attempt against a missing image.

Fix: add && needs.build.result == 'success' to the condition.

Risk: String-matching on cpflow exists error output is fragile

delete-app.sh (and the inline version in cpflow-deploy-review-app.yml) detect "real" errors by checking for specific strings like "Double check your org", "ERROR", "Traceback", "Net::". The file already has a # TODO comment acknowledging this, but the consequence of missing a new error pattern is silently treating a real failure as "app not found" — which could skip deletion without any alert.

This is acceptable given the stated cpflow limitation, but the heuristic string list should be kept in sync with cpflow's actual output. An alternative interim approach: treat any non-zero exit with non-empty stderr as a failure unless it matches a known "not found" pattern, rather than enumerating known error patterns.

Minor: Workflow-level APP_NAME env var is redundant and overridden in cpflow-deploy-review-app.yml

The top-level env: block in the deploy review app workflow sets APP_NAME from the raw event payload, but the Resolve PR ref and commit step later overwrites it in $GITHUB_ENV with the normalized PR number. The step-level value wins, making the workflow-level declaration dead configuration. It's also the only workflow in this set that has this pattern — the others don't duplicate APP_NAME this way.


Nitpicks

  • cpflow-setup-environment/action.yml: gem install cpflow -v ... without --no-document is slower than needed on CI runners.
  • cpflow-cleanup-stale-review-apps.yml: The Validate required secrets and variables step runs as a plain shell: bash step but has no id: — this is fine, but if you ever want to condition a downstream step on it, you'll need to add one.
  • The old nightly-remove-stale-review-apps.yml hardcoded the app name (qa-react-webpack-rails-tutorial) rather than using ${{ vars.REVIEW_APP_PREFIX }}. The new workflow correctly uses the variable — worth calling out as a subtle bug fix.

Summary

The core logic is solid and the security posture is materially improved. The one real bug (deploy running after build failure) should be fixed before merge. The string-matching fragility is an acknowledged limitation with a TODO already in place.

fi
deploy_args+=(--org "${CPLN_ORG}" --verbose)

cpflow deploy-image "${deploy_args[@]}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: deploy runs even when build fails.

When a job has an explicit if:, GitHub Actions drops the implicit success() guard. Because this condition only checks validate-branch's output, the deploy job will start even if the build job failed, wasting runner minutes and attempting to deploy a missing image.

Suggested change
cpflow deploy-image "${deploy_args[@]}"
if: needs.validate-branch.outputs.is_deployable == 'true' && needs.build.result == 'success'

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string-matching heuristic on cpflow exists error output (lines 43–54) is the main reliability risk in this file. The consequence of an unmatched error pattern is silently exiting 0 ("app not found") when the API actually returned an error, meaning the delete is skipped without any visible failure.

An alternative that inverts the logic: treat any non-zero exit with non-empty output as a failure unless it matches a known "not found" pattern:

# instead of matching known error strings, match known "not found" patterns
case "$exists_output" in
  *"not found"*|*"does not exist"*)
    echo "⚠️ Application does not exist: $APP_NAME"
    exit 0
    ;;
  *)
    echo "❌ ERROR: failed to determine whether application exists" >&2
    printf '%s\n' "$exists_output" >&2
    exit 1
    ;;
esac

This is safer by default — unknown output is treated as an error rather than a no-op. The tradeoff is that you'd need to know the exact "not found" wording from cpflow exists, which has the same coupling problem in the other direction. The TODO comment already captures this; just flagging it for awareness.

echo "ready=false" >> "$GITHUB_OUTPUT"
{
echo "Control Plane review app automation is not configured yet."
echo
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This top-level APP_NAME is computed from the raw event payload and is later overwritten in $GITHUB_ENV by the Resolve PR ref and commit step (echo "APP_NAME=..." >> "$GITHUB_ENV"). The step-level write wins, so this declaration is dead configuration and can be confusing. Consider removing it and relying solely on the step-computed value, which is already the authoritative one (it handles all three event types correctly).

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow Generated GitHub Actions Flow

This is a meaningful security and reliability upgrade over the handwritten workflows. Several issues from the old setup are fixed here; I've also flagged a few new items worth addressing.


✅ Notable Improvements

  • Token passed via env, not --token flag — prevents leakage in /proc/<pid>/cmdline and ps output (old setup-environment used cpln profile update default --org "$ORG" --token "$TOKEN")
  • Author association gating on /deploy-review-app and /delete-review-app — the old delete workflow accepted the command from any commenter
  • Fork PR protection in the deploy workflow — old workflow ran on all pull_request events without a fork check, allowing fork PRs to attempt using repo secrets
  • Prefix guard in delete-app.sh — prevents accidental deletion of non-review-app resources
  • SSH key isolated to a dedicated step — keeps it out of the main build step's env where ACTIONS_STEP_DEBUG=true would expose it
  • Randomized heredoc delimiters in the promotion workflow — prevents content injection if rollback state contains EOF
  • Old staging workflow fixed — the old deploy-to-control-plane-staging.yml triggered on branches: ['*'] (every push), burning runners on every feature branch; new workflow correctly filters to main/master
  • cancel-in-progress: false on all deployment concurrency groups — prevents partial rollout state from mid-cancelled deploys

🔴 Issues

1. cpflow-delete-review-app.yml uses pull_request_target without an explanation comment

The delete workflow triggers on pull_request_target: [closed]. This event runs with write permissions in the base repo context and can post PR comments on fork PRs. It's the correct choice here (since no PR code is checked out), but it's a well-known footgun — future maintainers may switch it to pull_request thinking it's "safer" and break fork PR cleanup. cpflow-review-app-help.yml has a good comment explaining this; the delete workflow should have one too.

2. cpflow exists error-pattern matching is duplicated

The same fragile string-matching logic ("Double check your org", "ERROR", "Traceback", "Net::", etc.) appears both in delete-app.sh and inline in the Check if review app exists step of cpflow-deploy-review-app.yml. The delete-app.sh has a TODO explaining the fragility; the workflow copy does not. If cpflow changes its error output, both need updating in sync. Consider either extracting this into a shared shell script, or at minimum adding the same TODO comment in the workflow.


🟡 Warnings

3. CPLN_TOKEN is written to GITHUB_ENV and persists to all subsequent steps

In cpflow-setup-environment/action.yml (lines 67–72), the token is written to GITHUB_ENV so later workflow steps can authenticate. This is necessary for cpflow to work, but it means the token is present in the environment of every subsequent step — including any step that might env | sort or log environment state. Consider documenting this tradeoff in a comment.

4. Production health check accepts 301/302 by default

HEALTH_CHECK_ACCEPTED_STATUSES defaults to '200 301 302'. A redirect to a login page or a broken URL returns a redirect, so the health check passes even if the app is misconfigured post-deploy. The docs note the recommended mitigation (set a dedicated /health endpoint and override to "200"). Consider making the default stricter, or adding a warning to the step summary when a redirect is accepted.

5. Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs at midnight UTC. If it fails (bad token, cpflow error, etc.), there is no alert — stale apps silently accumulate cost. Consider adding a Slack/email notification on failure, or at minimum ensure the repo has GitHub workflow failure notifications enabled for scheduled runs.


ℹ️ Informational

6. /help command has no author_association check

cpflow-help-command.yml responds to /help from any commenter (no association guard). This is probably intentional — help is informational — but on a high-volume repo the bot will reply to every /help comment from any user. Noting for awareness.

7. ruby/setup-ruby@v1 and actions/github-script@v7 use floating major versions

These will pick up any non-breaking updates automatically, which is generally fine. However, actions/checkout@v4 is also a floating major — all three could be pinned to a specific SHA for a more locked-down supply chain posture if that's a concern.

name: Delete Review App

on:
pull_request_target:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull_request_target deserves an explanation comment here.

pull_request_target runs in the context of the base repo (write permissions, access to secrets) and is required for fork PR cleanup — pull_request can't post comments on fork PRs because the token is read-only. This is safe here because no PR code is ever checked out (the actions/checkout@v4 below fetches base-branch code only).

cpflow-review-app-help.yml has a good comment explaining this exact pattern. Without a parallel comment here, a future maintainer may "fix" this to pull_request thinking it's safer, silently breaking fork PR cleanup.

Suggested change
pull_request_target:
# pull_request_target is intentional: it has write permission to comment on PRs from
# forks, where `pull_request` would be read-only. This is safe because no PR code is
# checked out — the job only calls cpflow with base-repo credentials.
# Do not switch this to `pull_request` or add a checkout of the PR ref without re-evaluating.
pull_request_target:

Comment on lines +66 to +72
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing CPLN_TOKEN to GITHUB_ENV propagates it to every subsequent step in the calling workflow — not just the steps that explicitly use it. This is a necessary tradeoff (cpflow picks it up from the environment automatically), but it's worth documenting so future maintainers don't inadvertently add steps that log environment state.

Suggested change
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
# Persist CPLN_TOKEN to GITHUB_ENV so all subsequent steps can authenticate
# with cpflow/cpln without putting the token on the command line.
# Trade-off: the token is present in every subsequent step's environment.
# Do not add steps that dump env vars (e.g., `env | sort`) after this point.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"

Comment on lines +28 to +35
# TODO: replace this string-matching with a structured signal once `cpflow exists` exposes one
# (e.g. a distinct exit code for "not found" vs. API/auth errors, or `cpflow exists --json`).
# Until then, keep this list in sync if `cpflow exists` starts emitting new error patterns —
# any unmatched error string would otherwise be silently treated as "app not found".
exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same cpflow exists error-pattern matching appears verbatim in the Check if review app exists step in cpflow-deploy-review-app.yml (line ~195). The TODO comment here is accurate but only covers this file — the workflow copy is uncovered.

If cpflow adds a new error pattern (e.g. "rate limit" or "timeout"), both locations need updating in sync. Consider extracting this into a shared script (e.g. .github/scripts/cpflow-app-exists.sh) that both the action and the workflow call, so the pattern list is maintained in one place.

# repo variable to tighten this for apps that expose a dedicated health endpoint
# (e.g. "200" for a plain /health, or "200 401 403" for apps that auth-gate / without
# redirecting).
HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting 301/302 as "healthy" means a redirect to a login page, maintenance page, or even a misconfigured CDN passes the health check. The comment correctly explains why (curl without -L, Rails auth-gating /), but a redirect can pass even when the destination is broken.

Consider adding a step summary warning when a non-200 status is accepted, so operators reviewing the promotion log have visibility:

if [[ "${http_status}" == "${accepted}" ]]; then
  if [[ "${http_status}" != "200" ]]; then
    echo "::warning::Health check passed with HTTP ${http_status} (not 200) — verify the redirect destination is healthy."
  fi
  echo "healthy=true" >> "$GITHUB_OUTPUT"
  exit 0
fi

Also worth documenting in .controlplane/shakacode-team.md that setting HEALTH_CHECK_ACCEPTED_STATUSES=200 and adding a /health endpoint is the recommended production posture.

on:
workflow_dispatch:
schedule:
- cron: "0 0 * * *"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scheduled workflows that fail silently are easy to miss — GitHub does send an email for failed scheduled runs but only to the repo owner by default, and only after 7 days of consecutive failures (as of current GitHub behavior).

Consider adding a failure notification step, or at minimum documenting that the team should verify GitHub's "Notify me of failed workflow runs" notification setting is enabled. Stale review apps that aren't cleaned up accrue Control Plane costs.

Suggested change
- cron: "0 0 * * *"
- cron: "0 3 * * *" # 3 AM UTC — avoids exact midnight when many repos run scheduled jobs simultaneously

(The cron timing is a minor suggestion — staggering off midnight reduces GitHub's scheduler contention, making the run more likely to start on time.)

# Override per-repo by setting `CPLN_CLI_VERSION` / `CPFLOW_VERSION` repo variables;
# an empty input falls back to the action's pinned default below.
default_cpln_cli_version="3.3.1"
default_cpflow_version="5.0.0.rc.0"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RC version as production default5.0.0.rc.0 is a release candidate. Using an RC as the fallback version in a generated template that ships into production CI pipelines is risky: RCs may contain breaking changes or regressions that won't be caught until a live deploy fails.

Before merging, please either:

  • Replace this with the stable release once 5.0.0 is published, or
  • Hold the PR until the stable release is available and pin to that.

The same RC version is also hardcoded in the bootstrap step of cpflow-deploy-review-app.yml (see comment there).

uses: ruby/setup-ruby@v1
with:
ruby-version: "3.4"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version string duplicated across two files — This bootstrap step hardcodes "5.0.0.rc.0" independently of default_cpflow_version in cpflow-setup-environment/action.yml. If one is bumped without the other, the bootstrap step will install a different cpflow version than the rest of the workflow, which could silently generate different action files.

Consider extracting the version to a single source of truth (e.g., a step output or an env var read from one file), or at minimum add a comment pointing to the companion definition in cpflow-setup-environment.

for attempt in $(seq 1 "${CPFLOW_MAX_RETRIES}"); do
echo "Health check attempt ${attempt}/${CPFLOW_MAX_RETRIES}"

workload_stderr="$(mktemp)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temp files created per loop iteration without a trapmktemp is called on every retry iteration; each file is cleaned up within its iteration (rm -f "${workload_stderr}" on both success and failure paths), so the happy path is fine. However, if the runner is killed or the script is interrupted between mktemp and the paired rm -f, the temp file leaks.

A trap registered before the loop is the standard guard:

Suggested change
workload_stderr="$(mktemp)"
workload_stderr="$(mktemp)"
trap 'rm -f "${workload_stderr}"' EXIT

Or, restructure to create one temp file before the loop and reuse it across iterations.

# deploy branches unless `cpflow generate-github-actions --staging-branch BRANCH`
# was used. If STAGING_APP_BRANCH is later changed in repository variables, keep
# this list in sync so pushes to that branch actually trigger the workflow.
branches: ["master"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Push trigger can silently diverge from STAGING_APP_BRANCH variable — The on.push.branches filter is hardcoded to ["master"] (noted in the comment above). If STAGING_APP_BRANCH is later changed in repository variables to a different branch, pushes to that new branch won't trigger this workflow at all — they'll silently skip deployment with no error.

The existing comment documents the limitation well, but a runtime guard in the validate-branch job would make the misconfiguration observable. Consider adding a ::warning:: or step summary entry when GITHUB_REF_NAME matches STAGING_APP_BRANCH but doesn't match any of the hardcoded trigger branches, so operators know to update the branches: filter too.

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

Code Review: cpflow Generated GitHub Actions Migration

Overview

This PR replaces hand-written Control Plane GitHub Actions with a generated cpflow-* action/workflow suite (+2131/-1147). The migration is large but well-structured, with meaningful improvements in security, reliability, and operational correctness.


Security Improvements ✅

  • SSH key isolation: Private keys are loaded in a dedicated step and cleared from the environment before the Docker build step, preventing exposure in ACTIONS_STEP_DEBUG output. The comment explaining why (avoiding env-dump leakage) is a nice touch.
  • Token routing via env: All Control Plane tokens are passed as environment variables rather than CLI flags, preventing leakage via /proc/<pid>/cmdline and ps output. The old cpln profile update default --org "$ORG" --token "$TOKEN" pattern is gone.
  • Trusted source checkout first: cpflow-deploy-review-app.yml checks out the default branch's composite actions before checking out PR code under ./app, preventing a PR from replacing local actions before secrets are passed to them.
  • Prefix guard on deletion: The new delete-app.sh enforces a required REVIEW_APP_PREFIX- prefix before deleting, replacing the old production/staging grep that could be bypassed with creative app naming.
  • persist-credentials: false on all checkouts reduces the window where GITHUB_TOKEN can be extracted from .git/config.
  • Indirect bash lookup guard in cpflow-validate-config: The [A-Z_][A-Z0-9_]* name-sanitisation guard prevents injection via crafted entry names like BASH_FUNC_foo%%. Well-done.
  • Fork isolation: PRs from forks cannot auto-deploy review apps or trigger /deploy-review-app without explicit member authorization (OWNER/MEMBER/COLLABORATOR association check).

Reliability Improvements ✅

  • cancel-in-progress: false on all deployment jobs prevents partial deploys from a mid-rollout cancellation.
  • Explicit timeout-minutes on every job.
  • set -euo pipefail throughout all shell scripts.
  • Randomized heredoc delimiter for rollback state output prevents early truncation by a stray EOF inside an image tag.
  • Multi-workload rollback with per-workload failure aggregation (vs. the old single rails workload --set with a fixed 30-second sleep).
  • Parallel rollback readiness polling keeps worst-case wall time bounded regardless of workload count.

Issues to Address

1. Release Candidate as Default Version (Medium Risk — see inline comment)

default_cpflow_version="5.0.0.rc.0" is used as the pinned default in cpflow-setup-environment and hardcoded again in the bootstrap step of cpflow-deploy-review-app.yml. RC releases carry breaking-change and regression risk in production CI. This should be pinned to a stable 5.0.0 release before merging, or held until the stable release ships. See inline comments on both files.

2. STAGING_APP_BRANCH / branches: filter can silently diverge (Low Risk — see inline comment)

If STAGING_APP_BRANCH is later changed in repository variables, the push trigger won't fire for the new branch and staging deploys will silently stop. The comment in the workflow documents this, but a runtime warning would make the misconfiguration observable without requiring operators to remember to update both places.

3. /deploy-review-app comment trigger runs from the default branch

When a member comments /deploy-review-app, the workflow runs from the default branch's workflow definition (that's how issue_comment works on GitHub), not from the PR branch. The PR description and docs mention this and recommend testing with gh workflow run. Confirm this was done before merging, since slash-command logic changes in this PR won't self-test via PR comments.

4. Auto-comment noise on bot PRs (Minor)

cpflow-review-app-help.yml posts a help comment on every PR when REVIEW_APP_PREFIX is set — including Dependabot and Renovate PRs. Consider excluding well-known bot actors (github.actor != 'dependabot[bot]') to reduce noise.


Minor Nits

  • Temp files from mktemp in cpflow-wait-for-health are cleaned up within each loop iteration but not protected by a trap against unexpected interruption. See inline comment.
  • The old delete-control-plane-app/delete-app.sh checked cpflow exists and on non-zero exit just exited 0. The new version correctly checks exit code 3 for "not found" vs. other errors — this is a real correctness fix.
  • cpflow-review-app-help.yml correctly uses pull_request_target (not pull_request) so it can comment on fork PRs, and intentionally doesn't check out code. The comment explaining this is helpful.

Summary

The security hardening is thorough, the rollback logic is substantially more robust than before, and the inline comments throughout the action/workflow files are unusually well-reasoned. The primary blocker before merge is replacing the 5.0.0.rc.0 RC pin with a stable release. The rest of the feedback is advisory.

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

Code Review: Replace hand-written CI/CD with generated cpflow-* GitHub Actions

This is a substantial, well-structured overhaul. The generated flow introduces meaningfully better security and operational guardrails than the hand-written version it replaces. A few items to address before merging:


🐛 Bug — upstream staging token not passed to copy-image-from-upstream

In .github/workflows/cpflow-promote-staging-to-production.yml (the "Copy image from staging" step), CPLN_UPSTREAM_TOKEN is exported as an env var with a comment saying it's passed via env to avoid /proc/<pid>/cmdline leakage — but cpflow copy-image-from-upstream doesn't read CPLN_UPSTREAM_TOKEN automatically. The tool expects the upstream token via -t/--upstream-token or a recognised env var name. As written, the command runs with the production profile only and will fail when the staging and production orgs are separate service accounts. (Cursor Bugbot flagged the same issue — its proposed fix of adding -t is correct, though passing via env and then referencing ${CPLN_UPSTREAM_TOKEN} as the argument value avoids the /proc concern.)

Suggested fix:

run: |
  set -euo pipefail
  cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" \
    --upstream-token "${CPLN_UPSTREAM_TOKEN}" \
    --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"

⚠️ Release candidate hardcoded as default version

default_cpflow_version="5.0.0.rc.0" in cpflow-setup-environment/action.yml ships a pre-release gem to every repo that doesn't set CPFLOW_VERSION. RC versions are fine to pin explicitly, but they shouldn't be the fallback for production CI/CD. Update to the latest stable cpflow release before merging; if 5.x isn't GA yet, use the latest 4.x stable.


ℹ️ Bootstrap step in cpflow-deploy-review-app.yml is fragile

The inline ruby -0pi -e 'gsub!(...)' that patches the generated cpflow-setup-environment/action.yml relies on exact string literals from the upstream generator output. If the generator's template wording changes, the gsub! calls will silently no-op and broken ${{ vars.CPLN_CLI_VERSION }} interpolation strings will end up in the committed action file. Consider removing the bootstrap mechanism entirely (the actions already exist in the repo) or replacing the regex with a more robust YAML rewrite.


ℹ️ Floating action tags vs. pinned SHAs

cpflow-setup-environment/action.yml notes that third-party actions (actions/checkout@v4, ruby/setup-ruby@v1, actions/github-script@v7) use floating major-version tags rather than immutable SHAs. The comment acknowledges this is intentional for generated templates. For a supply-chain-conscious project, consider adding a Dependabot config for Actions SHA pinning, or at minimum noting in the maintenance guide that SHA pinning should be applied when adopting these files.


✅ What's well-done

  • Token isolation: tokens are consistently passed via env: rather than as CLI flags, preventing leakage into process listings and debug logs.
  • SSH key scoping: the SSH prep step keeps DOCKER_BUILD_SSH_KEY out of the build step's environment, and cleanup_build_ssh runs on EXIT.
  • pull_request_target trust boundary: both the delete and help workflows check out the default branch, not the fork head, and the deploy workflow enforces same-repo checks before touching secrets. The inline comments explaining each decision are exactly right.
  • Prefix guard on delete: the new delete-app.sh enforces REVIEW_APP_PREFIX prefix matching rather than the old regex-blocking approach, which was easy to bypass with creative app names.
  • cancel-in-progress: false on all deployment workflows avoids partial-state bugs during concurrent pushes.
  • cpflow-validate-config: the indirect ${!name:-} lookup with a strict ^[A-Z_][A-Z0-9_]*$ name guard is a clean way to check secrets without interpolating them into log output.
  • Rollback: capturing per-workload state before promotion and issuing parallel readiness polls during rollback is a meaningful reliability improvement.

Comment on lines +275 to +282
CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
STAGING_IMAGE: ${{ steps.staging-image.outputs.image }}
shell: bash
run: |
set -euo pipefail
cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: CPLN_UPSTREAM_TOKEN is exported but never consumed by the command.

cpflow copy-image-from-upstream doesn't read CPLN_UPSTREAM_TOKEN from the environment automatically — it expects the upstream token via --upstream-token (or equivalent flag). As written, the command authenticates only with the production profile and will fail when staging and production use separate service accounts.

The comment says "pass via env rather than -t" to avoid leaking the value into /proc/<pid>/cmdline, which is the right instinct — just reference the env var as an argument value:

Suggested change
CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
STAGING_IMAGE: ${{ steps.staging-image.outputs.image }}
shell: bash
run: |
set -euo pipefail
cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"
cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" \
--upstream-token "${CPLN_UPSTREAM_TOKEN}" \
--org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"

This keeps the token out of the command line (it's still sourced from the env var, not a literal) while actually passing it to the tool.

# Override per-repo by setting `CPLN_CLI_VERSION` / `CPFLOW_VERSION` repo variables;
# an empty input falls back to the action's pinned default below.
default_cpln_cli_version="3.3.1"
default_cpflow_version="5.0.0.rc.0"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release candidate as a production default is risky.

5.0.0.rc.0 will be the fallback for every repo that doesn't explicitly set CPFLOW_VERSION. Pre-release gems can have breaking changes or undocumented behaviour. Please update this to the latest stable cpflow release before merging, or — if 5.x isn't GA — pin to the latest 4.x stable here and upgrade the default once 5.x ships.

Suggested change
default_cpflow_version="5.0.0.rc.0"
default_cpflow_version="4.1.1"

(replace with the actual latest stable tag)

gem install cpflow -v "5.0.0.rc.0" --no-document
ruby -S cpflow generate-github-actions --staging-branch master
# shellcheck disable=SC2016
ruby -0pi -e '$_.gsub!(/so callers can pass `\$\{\{ vars\.CPLN_CLI_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally."); $_.gsub!(/so callers can pass `\$\{\{ vars\.CPFLOW_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally.")' .github/actions/cpflow-setup-environment/action.yml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragile in-place string patch on a generated file.

This ruby -0pi -e 'gsub!(...)' relies on exact string literals from the upstream generator template. If the generator's wording ever changes, the substitutions silently no-op and the committed action.yml will contain raw ${{ vars.CPLN_CLI_VERSION }} / ${{ vars.CPFLOW_VERSION }} interpolation expressions, which GitHub Actions will evaluate at runtime (potentially emitting empty strings or variable reference errors).

Since the cpflow-* actions already exist in this repo, the entire bootstrap block (Set up Ruby for cpflow bootstrap + Bootstrap generated cpflow actions) is effectively dead code on every normal run — these steps only fire when the action file is missing. Consider removing the bootstrap mechanism and relying on the normal regeneration workflow documented in readme.md instead.

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

Code Review: Replace hand-written Control Plane Actions with generated cpflow-* flow

Overall assessment: This is a solid, well-structured replacement. The security model is thoughtfully designed — trusted-branch checkout before using secrets, fork PR gating, prefix-based delete guard, randomized heredoc delimiters, and indirect shell variable lookup for secret validation. The rollback-on-failure logic in the production promotion workflow is a particularly good addition. A few issues below to address before merging.


Security

[Medium] github-script steps interpolate step outputs directly into JS string literals

Several steps pass step-output values directly into JavaScript string literals using the expression syntax. If an output contains a double-quote (e.g. a URL with a malformed query string returned by the Control Plane API), the JS literal is broken at runtime and the comment-update step silently fails. Any value sourced from an external API (workload_url) is also a potential injection vector if the upstream service is compromised.

Affected in cpflow-deploy-review-app.yml:

  • "Update PR comment" steps: Number("${{ steps.create-comment.outputs.comment-id }}") — safe numerically, but inconsistent with the env-routing pattern used everywhere else in the PR.
  • Finalize deployment status step: steps.workload.outputs.workload_url and steps.init-deployment.outputs.result interpolated directly into JS strings.

Recommended fix: export each value as env: on the step and access via process.env.VAR_NAME in the script — the standard safe pattern for github-script.


Correctness / Reliability

[Minor] validate-branch checkout in cpflow-deploy-staging.yml missing persist-credentials: false

Every other checkout in this PR sets persist-credentials: false, but the one inside the validate-branch job does not. This leaves the GITHUB_TOKEN credential helper in .git/config, which is inconsistent and could persist longer than intended on self-hosted runners.

[Minor] cpflow-detect-release-phase exits 1 for unknown apps

If the app name (e.g. a review app like myapp-pr-42) does not match any entry in .controlplane/controlplane.yml — neither an exact key nor a match_if_app_name_starts_with prefix — the action hard-exits 1, failing the entire deployment. For the "app has no release_script" case, emitting an empty flag and continuing is safer than aborting.

[Minor] Bootstrap self-heal installs an RC gem from the internet at runtime

Lines 71–76 of cpflow-deploy-review-app.yml run gem install cpflow -v "5.0.0.rc.0" --no-document if the composite actions are missing from the default branch. RC gems can be yanked from RubyGems, making this an availability concern. The bootstrap also regenerates files into the checked-out workspace; confirm that the generated output cannot shadow any path that is later loaded as trusted code.


Supply-chain

[Minor] Third-party actions pinned to floating major-version tags, not immutable SHAs

cpflow-setup-environment/action.yml acknowledges this tradeoff in an inline comment. Surfacing it here for completeness: ruby/setup-ruby@v1, actions/checkout@v4, and actions/github-script@v7 are mutable. Dependabot or Renovate SHA-pinning mode closes this gap for repos with stricter supply-chain requirements.


Design Notes (non-blocking)

  • Health check accepts 301/302 by default. The inline comment in the promotion workflow clearly explains the tradeoff and provides a migration path (/health endpoint + HEALTH_CHECK_ACCEPTED_STATUSES override). Good documentation.
  • Same-repo PRs can modify .controlplane/controlplane.yml in the review app context. The PR code is checked out under app/ and cpflow setup-app/cpflow deploy-image read config from there. This is expected behaviour for review apps but means a PR can adjust provisioning parameters (workload sizing, env vars) without a separate config-review gate.
  • The randomised heredoc delimiter in the rollback-state capture step is an elegant fix for the GITHUB_OUTPUT multiline-value edge case.

uses: actions/github-script@v7
with:
script: |
const commentId = Number("${{ steps.create-comment.outputs.comment-id }}");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Medium] Expression injection via external API value

workload_url is sourced from the Control Plane API (jq -r '.status.endpoint // empty') and interpolated directly into a JavaScript string literal. A double-quote in the URL breaks the JS context; a more malicious payload from a compromised upstream could execute arbitrary JavaScript in the github-script runner.

The PR already uses env-variable routing consistently in run: steps for this exact reason — the same pattern applies to github-script:

Suggested change
const commentId = Number("${{ steps.create-comment.outputs.comment-id }}");
with:
env:
DEPLOY_COMMENT_ID: ${{ steps.create-comment.outputs.comment-id }}
DEPLOY_DEPLOYMENT_ID: ${{ steps.init-deployment.outputs.result }}
DEPLOY_APP_URL: ${{ steps.workload.outputs.workload_url }}
DEPLOY_JOB_STATUS: ${{ job.status }}
script: |
const commentId = Number(process.env.DEPLOY_COMMENT_ID);
const deploymentId = process.env.DEPLOY_DEPLOYMENT_ID;
const appUrl = process.env.DEPLOY_APP_URL;
const success = process.env.DEPLOY_JOB_STATUS === "success";

- name: Update PR comment with build status
if: steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success')
uses: actions/github-script@v7
with:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Step output interpolated into JS string — inconsistent with env-routing pattern

The comment ID is a GitHub-controlled integer so there's no real injection risk here, but this is inconsistent with the env-variable routing pattern the rest of the PR applies to run: steps. The same issue recurs in the other Update PR comment steps (lines ~363). Prefer:

Suggested change
with:
const commentId = Number(process.env.CREATE_COMMENT_ID);

with env: { CREATE_COMMENT_ID: "${{ steps.create-comment.outputs.comment-id }}" } on the step.

fi

- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Missing persist-credentials: false

Every other checkout in this PR uses persist-credentials: false, but this one does not. The GITHUB_TOKEN credential helper is left in .git/config for the lifetime of the runner, which is inconsistent with the stated security intent and can persist on self-hosted runners.

Suggested change
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
with:
persist-credentials: false

end

unless app_config.is_a?(Hash)
warn "Error: app '#{app_name}' is not defined under `apps:` in `.controlplane/controlplane.yml`."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Hard exit on unrecognised app name will break review-app deployments

If the review app name (e.g. myapp-pr-42) matches neither an exact key in controlplane.yml nor a match_if_app_name_starts_with prefix, this exits 1 and fails the deploy. The intent of this action is to answer "does this app have a release script?". "App not found" is functionally the same as "no release script" — the deploy should proceed without --run-release-phase rather than abort.

Consider changing the not-found branch to:

# App not in config — assume no release script rather than failing the deploy
puts ""

and remove the exit 1.

shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Bootstrap installs an RC gem from RubyGems at workflow runtime

If the composite action files are absent from the default branch (e.g. after a force-push or accidental deletion), this fallback installs cpflow 5.0.0.rc.0 — an RC gem that can be yanked — and then regenerates the action files into the live checkout. Two concerns:

  1. Availability: a yanked RC gem will break every review-app deploy until the version in this file is updated.
  2. Trust boundary: the regenerated composite actions are written into the same workspace that subsequent steps load as trusted code. Confirm that cpflow generate-github-actions cannot overwrite any file that is later uses:-d by this workflow.

At a minimum, document that the pinned RC version here and in cpflow-setup-environment/action.yml must be kept in sync, and consider switching to a GA release once one is available.

# immutable commit SHA.
steps:
- name: Set up Ruby
uses: ruby/setup-ruby@v1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Floating action version — not pinned to an immutable SHA

The existing comment explains the tradeoff for generated templates. For repos that opt in to supply-chain hardening, replace with the corresponding immutable SHA, e.g.:

uses: ruby/setup-ruby@354a1ad156761f5ee39be78be9a1e9ee7e4dad7e  # v1.244.0

Dependabot with versioning-strategy: lockfile-only or Renovate in SHA-pinning mode will maintain these automatically.

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

This is a well-structured migration from hand-written workflows to generated cpflow-* GitHub Actions. The security thinking is thorough — fork gating, double-checkout pattern, persist-credentials: false, secrets via env vars, and a prefix-safety check on deletion are all done correctly.

Security

  1. Expression interpolation in github-script JavaScript: Several actions/github-script blocks interpolate ${{ steps.workload.outputs.workload_url }} directly into a JS string literal. The URL originates from an external API (cpln workload get ... | jq -r .status.endpoint) and could contain characters that break the JS string (e.g. double quotes or backticks). The safer pattern is to export the value to GITHUB_ENV first (already done for WORKFLOW_URL and CONSOLE_URL via core.exportVariable) and read it as process.env.APP_URL in JavaScript. See inline comment.

  2. Bootstrap step pins an RC version: gem install cpflow -v "5.0.0.rc.0" is only a fallback when generated action files are absent, but using a pre-release gem in a deployment pipeline adds risk (possible breaking changes, lower stability). The bootstrap step also patches the generated action with ruby -0pi inline; if the upstream output format changes, the substitution silently becomes a no-op. Recommend either committing the generated actions to the base branch so the bootstrap never runs, or pinning a stable GA release here.

Code Quality

  1. Missing persist-credentials: false in validate-branch checkout (cpflow-deploy-staging.yml line 58): Every other actions/checkout in this PR uses persist-credentials: false. The validate-branch job is the lone exception. While the job only reads action files, the inconsistency could confuse future editors and leaves the GITHUB_TOKEN in .git/config unnecessarily.

  2. Greedy strip in staging image extraction (cpflow-promote-staging-to-production.yml line 263): ${staging_image_ref##*/image/} uses the greedy ## operator. For a ref like /org/myorg/image/myapp it gives myapp (correct), but for /org/myorg/image/myapp/image/v2 it gives v2 instead of myapp/image/v2. Using the non-greedy form ${staging_image_ref#*/image/} strips only the first /image/ segment, which is the intended behavior.

  3. Health check accepts 301/302 by default for production promotions: The comments explain this tradeoff and recommend overriding with HEALTH_CHECK_ACCEPTED_STATUSES=200. Worth adding a ::notice:: annotation during the promote workflow when redirect-permissive defaults are in use, so operators are reminded at promote time.

Minor

  • cpflow-review-app-help.yml posts a comment on every newly opened PR (including bots/Dependabot) once REVIEW_APP_PREFIX is set. Consider gating on github.event.pull_request.user.type != Bot to reduce noise.
  • cpflow-help-command.yml workflow_dispatch path can post the help comment on any PR number including closed ones. Low impact but could be a UX surprise.

Overall: The PR is a significant improvement — better fork safety, structured rollback, env-var parity checks before production promotion, and prefix-guarded deletion are all good additions. The issues above are mostly low-to-medium severity, but the persist-credentials inconsistency and RC version bootstrap are easy wins to clean up before merge.

script: |
const commentId = Number("${{ steps.create-comment.outputs.comment-id }}");
const deploymentId = "${{ steps.init-deployment.outputs.result }}";
const appUrl = "${{ steps.workload.outputs.workload_url }}";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workload_url is sourced from cpln workload get ... | jq -r '.status.endpoint // empty' — an external API value. If the endpoint string contains " (or a backtick in template-literal context), it will break this JS string literal and could be exploited if the Control Plane API response is ever tampered with.

All other dynamic values here (WORKFLOW_URL, CONSOLE_URL) are already routed through core.exportVariable and read as process.env.*. Apply the same pattern:

Suggested change
const appUrl = "${{ steps.workload.outputs.workload_url }}";
const appUrl = process.env.APP_URL || "";

And add the export alongside the existing core.exportVariable calls in the "Set deployment links" step above:

core.exportVariable("APP_URL", workload_url);  // in the "Retrieve app URL" shell step, or pass via a prior github-script step

Comment on lines +71 to +78
if: ${{ hashFiles('.github/actions/cpflow-validate-config/action.yml') == '' }}
shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
ruby -S cpflow generate-github-actions --staging-branch master
# shellcheck disable=SC2016
ruby -0pi -e '$_.gsub!(/so callers can pass `\$\{\{ vars\.CPLN_CLI_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally."); $_.gsub!(/so callers can pass `\$\{\{ vars\.CPFLOW_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally.")' .github/actions/cpflow-setup-environment/action.yml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two concerns with this bootstrap block:

  1. RC version in production CI: 5.0.0.rc.0 is a pre-release. If the actions are already committed to the base branch (the normal case), this step is skipped — but for a fresh repo setup or if the actions are accidentally deleted, this RC gem installs and generates new action files mid-run. Using a GA release (or no version pin with gem install cpflow) would be safer.

  2. Inline patching of generated files: The ruby -0pi one-liner patches cpflow-setup-environment/action.yml after generation. If the upstream template text changes in a future cpflow release, this substitution silently becomes a no-op, and the literal ${{ vars.CPLN_CLI_VERSION }} Markdown in the description won't be fixed. Consider either:

    • Committing the generated actions to the base branch so this bootstrap never runs in practice, or
    • Adding an explicit check after the substitution (grep -q "repository variable value" ... || exit 1) to fail loudly if the patch didn't apply.

Comment on lines +56 to +58
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checkout is missing persist-credentials: false, which is present on every other actions/checkout call in this PR. This leaves the GITHUB_TOKEN credential helper stored in .git/config for the duration of the job, even though nothing in this job pushes to git.

Suggested change
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
with:
persist-credentials: false

fi

if [[ "${staging_image_ref}" == /org/*/image/* ]]; then
staging_image="${staging_image_ref##*/image/}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ## (greedy) operator strips the longest prefix matching */image/. For the expected Control Plane path format /org/myorg/image/myapp this is fine, but an image ref that contains a second /image/ segment (e.g. /org/myorg/image/myapp/image/v2) would be stripped to v2 instead of myapp/image/v2.

Using the non-greedy # strips only the first /image/ occurrence, which is more predictable:

Suggested change
staging_image="${staging_image_ref##*/image/}"
staging_image="${staging_image_ref#*/image/}"

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Health check variables ignore documented repository variable overrides
    • Updated the workflow env defaults to read the matching repository variables for health check and rollback readiness settings.

Create PR

Or push these changes by commenting:

@cursor push ed231dd54e
Preview (ed231dd54e)
diff --git a/.github/workflows/cpflow-promote-staging-to-production.yml b/.github/workflows/cpflow-promote-staging-to-production.yml
--- a/.github/workflows/cpflow-promote-staging-to-production.yml
+++ b/.github/workflows/cpflow-promote-staging-to-production.yml
@@ -16,8 +16,8 @@
   # Worst-case wall time per attempt is HEALTH_CHECK_INTERVAL plus the curl --max-time below
   # (10s), so the defaults give a ~10 minute window (24 × (15 + 10) = 600s) — enough for
   # most Rails cold boots (asset precompile + db:migrate + workload readiness).
-  HEALTH_CHECK_RETRIES: 24
-  HEALTH_CHECK_INTERVAL: 15
+  HEALTH_CHECK_RETRIES: ${{ vars.HEALTH_CHECK_RETRIES || '24' }}
+  HEALTH_CHECK_INTERVAL: ${{ vars.HEALTH_CHECK_INTERVAL || '15' }}
   # Space-separated list of HTTP statuses considered healthy. The default accepts 301/302
   # because `curl` is invoked without `-L`, so a root `/` that redirects to a login page
   # (common for Rails apps that auth-gate `/`) would otherwise be reported as unhealthy
@@ -31,8 +31,8 @@
   # expose a dedicated health endpoint (e.g. "200" for a plain /health, or "200 401 403"
   # for apps that auth-gate / without redirecting).
   HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
-  ROLLBACK_READINESS_RETRIES: 24
-  ROLLBACK_READINESS_INTERVAL: 15
+  ROLLBACK_READINESS_RETRIES: ${{ vars.ROLLBACK_READINESS_RETRIES || '24' }}
+  ROLLBACK_READINESS_INTERVAL: ${{ vars.ROLLBACK_READINESS_INTERVAL || '15' }}
   PRIMARY_WORKLOAD: ${{ vars.PRIMARY_WORKLOAD }}
 
 concurrency:

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit ff6a820. Configure here.

# for apps that auth-gate / without redirecting).
HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
ROLLBACK_READINESS_RETRIES: 24
ROLLBACK_READINESS_INTERVAL: 15
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Health check variables ignore documented repository variable overrides

Medium Severity

HEALTH_CHECK_RETRIES, HEALTH_CHECK_INTERVAL, ROLLBACK_READINESS_RETRIES, and ROLLBACK_READINESS_INTERVAL are hardcoded in the workflow env block, so setting them as repository variables has no effect. This contradicts the workflow's own comment ("Override these by … setting the matching repository variable") and the documentation in cpflow-help.md and shakacode-team.md, which list all four as optional repository variables. HEALTH_CHECK_ACCEPTED_STATUSES on the same block correctly uses ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}; these four need the same pattern.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ff6a820. Configure here.

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

Code Review: Replace hand-written GitHub Actions with generated cpflow-* flow

Overview

This PR replaces five hand-written Control Plane GitHub Actions workflows/actions with generated cpflow-* equivalents. The generated code is substantially better than what it replaces in security, reliability, and correctness.


Security Improvements (significant wins)

  • Token leak fix: Old setup-environment passed --token "$TOKEN" as a CLI argument, exposing it in /proc/<pid>/cmdline and ps output. New code uses the CPLN_TOKEN env var instead.
  • eval elimination: Old workflows used eval(process.env.GET_CONSOLE_LINK) in github-script steps — a serious code-injection vector if any env var was influenced by user input. Removed entirely.
  • Exact-match slash commands: Old code used contains(github.event.comment.body, '/deploy-review-app') — a comment like please /deploy-review-app now would have triggered a deployment. New code uses strict equality.
  • Prefix-guarded app deletion: Old delete-app.sh blocked names containing production or staging. New code checks against a strict REVIEW_APP_PREFIX- prefix, which is both safer and more explicit.
  • Bash nameref guard: cpflow-validate-config validates indirect variable names against ^[A-Z_][A-Z0-9_]*$ before ${!name} expansion, preventing BASH_FUNC_foo%%-style injection.
  • SSH key isolation: SSH key is written in a dedicated step so DOCKER_BUILD_SSH_KEY never appears in the build step environment (relevant when ACTIONS_STEP_DEBUG=true).
  • persist-credentials: false: Now consistently set on checkout steps to keep GITHUB_TOKEN out of .git/config.
  • Fork PR handling: Properly detects fork PRs and skips secret-dependent steps rather than silently passing secrets to untrusted code.

Issues and Concerns

1. Release candidate in production default (medium risk)

In .github/actions/cpflow-setup-environment/action.yml:

default_cpflow_version="5.0.0.rc.0"

A release candidate ships as the default for a production-critical deployment tool. If 5.0.0.rc.0 contains a bug, all repos inheriting the default are affected. Consider pinning the stable release before merge.

2. Fragile bootstrap-and-patch step (maintainability concern)

In cpflow-deploy-review-app.yml, there is a bootstrap step that reinstalls cpflow and regenerates actions if cpflow-validate-config/action.yml is missing — followed by an inline ruby -0pi -e patch that regex-substitutes specific strings in the generated output. This is brittle: if the upstream template changes the text being patched, the substitution silently no-ops and the generated file drifts. The bootstrap step is also unnecessary for a repo that has already committed the generated files. Consider removing the bootstrap and replacing it with a CI check that detects generated-file drift instead.

3. validate-branch checkout missing persist-credentials: false

In cpflow-deploy-staging.yml, the validate-branch job checks out the repo without persist-credentials: false, while the subsequent build and deploy jobs both set it. Minor inconsistency worth fixing for auditability.

4. Health check defaults accept redirects (documented footgun)

The default HEALTH_CHECK_ACCEPTED_STATUSES: '200 301 302' will pass a maintenance-mode redirect or an auth-gate redirect as healthy. The code contains a good inline comment recommending a dedicated /health endpoint. Consider surfacing this recommendation in cpflow-help.md so teams see it before an incident.

5. Rollback does not reverse DB migrations

Documented correctly in shakacode-team.md. Worth adding: a warning in the promotion workflow job summary when --run-release-phase was used and rollback was triggered, so operators know to check DB state manually.

6. Rollback readiness polling inconsistency

"Wait for deployment health" uses the cpflow-wait-for-health composite action (HTTP endpoint poll). "Wait for rollback readiness" rolls its own loop polling workload.status.ready. These check different things — a workload can report ready: true while still returning HTTP errors. Consider reusing cpflow-wait-for-health for rollback readiness too.

7. CONTRIBUTOR not in author_association allowlist

Both deploy and delete workflows gate slash commands on ["OWNER","MEMBER","COLLABORATOR"]. CONTRIBUTOR (GitHub's association for someone whose PR was previously merged) is excluded. This may be intentional, but worth confirming if past contributors should be able to trigger review apps.


Minor observations

  • Two overlapping help workflows: cpflow-help-command.yml (responds to /help comment) and cpflow-review-app-help.yml (auto-comments on PR open) both serve useful purposes but the auto-comment may be noisy on high-volume repos.
  • cpflow-review-app-help.yml missing contents: read: The workflow omits contents: read from permissions:. Explicit is better for auditability.
  • Nightly cleanup correctness: cpflow cleanup-stale-apps -a "${REVIEW_APP_PREFIX}" — verify that cpflow interprets this as a prefix match and not an exact app name; otherwise no stale apps will ever be deleted.

Removed issues worth noting

The old code had several bugs now fixed:

  • cpflow exists -a ${{ env.APP_NAME }} without quoting — word-split risk on unusual app names
  • sudo npm install -g — fails on self-hosted runners without sudo; new code uses a user-local npm prefix
  • ${{ github.event_name }} interpolated directly into a case statement inside run: — context injection vector
  • eval(process.env.GET_CONSOLE_LINK) in github-script — dangerous arbitrary code execution from an env var
  • Workflow URL construction via listJobsForWorkflowRun could return the wrong job under concurrency

Summary

This is a well-executed, security-positive replacement. The main concerns before moving out of draft are the 5.0.0.rc.0 release candidate default and the fragile bootstrap-and-patch step. The other items are improvements worth addressing but not blockers.

# Override per-repo by setting `CPLN_CLI_VERSION` / `CPFLOW_VERSION` repo variables;
# an empty input falls back to the action's pinned default below.
default_cpln_cli_version="3.3.1"
default_cpflow_version="5.0.0.rc.0"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pins a release candidate as the default for a production-critical deployment tool. If 5.0.0.rc.0 contains a bug, every repo that inherits this default is affected. Please update to the stable release before merge (or document a concrete plan to do so before this workflow runs on master).

Comment on lines +70 to +78
- name: Bootstrap generated cpflow actions
if: ${{ hashFiles('.github/actions/cpflow-validate-config/action.yml') == '' }}
shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
ruby -S cpflow generate-github-actions --staging-branch master
# shellcheck disable=SC2016
ruby -0pi -e '$_.gsub!(/so callers can pass `\$\{\{ vars\.CPLN_CLI_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally."); $_.gsub!(/so callers can pass `\$\{\{ vars\.CPFLOW_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally.")' .github/actions/cpflow-setup-environment/action.yml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bootstrap block has two concerns:

  1. It should never run for this repo: cpflow-validate-config/action.yml is committed here, so hashFiles(...) will always return a non-empty string. The block is dead code for this repo and only matters for repos that haven't yet committed the generated files.

  2. The ruby -0pi -e patch is very fragile: It regex-substitutes specific hardcoded strings in the generated cpflow-setup-environment/action.yml. If the upstream template changes that wording, the substitution silently no-ops and the file drifts undetected.

Recommend either removing the bootstrap entirely (since the generated files are committed) or replacing it with a CI drift-detection step that fails loudly if the committed generated files differ from what cpflow generate-github-actions would produce today.

Comment on lines +56 to +58
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build and deploy jobs both set persist-credentials: false on their checkout steps, but this validate-branch checkout does not. For consistency and to keep GITHUB_TOKEN out of .git/config, add persist-credentials: false here too.

Suggested change
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
with:
persist-credentials: false

(github.event_name == 'issue_comment' &&
github.event.issue.pull_request &&
github.event.comment.body == '/deploy-review-app' &&
contains(fromJson('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allowlist covers OWNER, MEMBER, and COLLABORATOR, but not CONTRIBUTOR. GitHub assigns CONTRIBUTOR to anyone whose PR has previously been merged into the repo. Depending on the project's intent, CONTRIBUTOR may be a reasonable addition so past contributors can deploy review apps without needing explicit collaborator access. Worth confirming whether this is intentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant